(57) Abstract

A distributed architecture parallel processing apparatus includes a central microprocessor having at least one external interface
connected to a similar interface of a neighboring parallel processor. The processors exchange data and control signals through the interfaces
to cooperatively share in the execution of a program. An inter-processor status register in each processor maintains the current status of the
processors.



3/2/2007, EAST Version: 2.1.0.14 










WO 97/34226 



PCT/CA97/00164 




SCALEABLE DOUBLE PARALLEL DIGITAL SIGNAL PROCESSOR 

This invention relates to digital processing apparatus, and more particularly to a digital 
processing apparatus having a distributed architecture. 

A classical Digital Signal Processor (DSP) has two major parts, namely a core
architecture and the peripherals. The major blocks of the core architecture are the
Program / Data Memory; the Arithmetic / Logic Unit (ALU); the Multiplier /
Accumulator (MAC); the Barrel Shifter (BS); the Data Address Generator (DAG); the
Program Address Generator (PAG); the Registers (used to hold intermediate results and
addresses, and to speed up access to the previous five blocks); and the buses.

Some of the peripheral blocks are the Serial Port(s), the Host Interface Port (parallel port),
and the Timers. Somewhere between these two parts are the DMA controller and the
Interrupt controller(s).

Various DSPs may use distinct ALU, MAC and BS computational blocks or may blend 
them into multifunctional units. 

The new generation of DSPs takes advantage of newer technologies that allow faster
clocking of old architectures and consequently higher processing power, faster memories
that allow improvements in the internal architecture of various blocks, multiple internal
buses, and new peripherals.

One of the common problems associated with the traditional DSP architectures is the 
uneven loading of the processors in a multiprocessor design. To cope with this problem, 
more recently, new DSP architectures have been proposed and implemented that have 
parallel processing capabilities. 

At the heart of their design is the concept of inter-processor communication via external 
interface ports, globally shared memory, and shared buses. The complexity of these 
designs, however, translates into extremely high cost IC implementations. 

Parallel Computing (PC) increases processing power by permitting parallel processing at
the routine (task) level. When a program has to execute two different routines that are
independent at the data level (i.e. the data written by one routine is not read by the other
routine), the two routines can be executed in parallel. This is referred to herein as macro
parallelism.

Congestion can also occur at the instruction level. When a program has to execute a
sequence of instructions that are independent at the data level, these instructions could be
executed in parallel. Executing these instructions in parallel (herein referred to as micro
parallelism) on the same processor, however, would require multiple buses and
instruction words large enough to handle multiple operands.

An object of the invention is to alleviate this problem.

According to the present invention there is provided digital processing apparatus 
comprising a microprocessor having at least one external interface for connection to a 
respective parallel processor having a similar interface, said interface permitting the 
exchange of data and control signals to permit said central processor and one or more 
parallel processors to cooperatively share in the execution of a program; and an inter- 
processor status register for maintaining the current status of said processors and said at 
least one parallel processor. 

The invention handles macro parallelism by allowing a processor to start a task (and be
notified on its completion) on a neighboring parallel processor.

The invention can also handle parallel processing of single instruction words (micro
parallelism) without the need for multiple buses and the like. Instead of requiring a
complex processor, the invention locks together multiple simpler processors to achieve a
similar result, and at the same time obtain the benefit of the power of multiple processing
units. When multiple processors are locked together, the instructions they execute can be
seen as the equal length segments of a Large Instruction Word (LIW). Depending on how
many processors are locked together, the length of the Large Instruction Word can vary.

The invention thus permits the handling of micro parallelism through LIW, as well as
macro parallelism through Parallel Computing.

The invention thus employs a processor interface and changes to the architecture of a DSP
that make both Parallel Computing and Large Instruction Word possible. The new
distributed processing architecture is particularly suited for the case when the processors
share the silicon space of a single integrated circuit.

The invention also provides a distributed architecture parallel processing apparatus, 
comprising a microprocessor having at least one external interface connected to a similar 
interface of a neighboring parallel processor, said processors exchanging data and control 
signals through said interfaces to cooperatively share in the execution of a program; and 
an inter-processor status register in each processor for maintaining the current status of 
said processors. 

The invention still further provides a method of executing a program comprising the steps
of providing at least two parallel processors, one said processor being a master and the or
each remaining processor being a slave; interconnecting said processors through an
external interface so that they can exchange data and control signals to cooperatively
share in the execution of a program; and maintaining the status of the cooperating
processors in an inter-processor status register provided therein.

It should be understood that each processor in a multi-processor configuration has the
potential to be a master and/or a slave. For example, if processor A starts a job on processor
B, A and B are in a master-slave relationship. However, B can "sub-contract" some part
of the job to C, in which case B and C are in a master-slave relationship. B is a slave to A,
but a master to C. At a different moment in time, which is software dependent, this
relationship can totally reverse itself.
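
The A, B, C example above can be sketched in a few lines of Python. This is only an illustration: the `roles` helper and the (master, slave) pair representation are hypothetical and are not part of the apparatus described here.

```python
def roles(links):
    """Derive each processor's current roles from active (master, slave) links.

    `links` is a hypothetical list of (master, slave) name pairs; the
    representation is an assumption made for this illustration only.
    """
    out = {}
    for master, slave in links:
        out.setdefault(master, set()).add("master")
        out.setdefault(slave, set()).add("slave")
    return out


# A starts a job on B, and B sub-contracts part of it to C.
now = roles([("A", "B"), ("B", "C")])

# Later, under software control, the relationships may be reversed.
later = roles([("C", "B"), ("B", "A")])
```

In both snapshots B holds the two roles at once, while A and C exchange theirs.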

BRIEF DESCRIPTION OF THE DRAWINGS 
The invention will now be described in more detail, by way of example only, with reference
to the accompanying drawings, in which:

Figure 1 is a diagrammatic illustration of a microprocessor with an external interface in
accordance with the invention;

Figure 2 shows the organization of the inter-processor status register;

Figure 3 shows the control and status lines of the interface in more detail;

Figure 4 shows the internal registers and bus structure of a processor in accordance with the
invention;

Figure 5 illustrates conflict resolution in a multiple processor system; and

Figure 6 is a more detailed diagram explaining the architecture of a processor in accordance
with the invention.

Referring to Figure 1, the central digital signal processor 1 includes a program / data
memory; an arithmetic / logic unit (ALU); a multiplier / accumulator (MAC); a barrel
shifter (BS); a data address generator (DAG); a program address generator (PAG);
registers for holding intermediate results and addresses and for speeding up access to the
previous five blocks; and buses. As these components are conventional, they are not
illustrated in the drawings and will not be described in detail.

The processor 1 also includes an inter-processor status register 2 (IPSR), described in more
detail with reference to Figure 2, right and left register banks 3, 4, and central register 13.
Right and left dual port data memories 12, 13 provide a memory window accessible both
to the central processor and to the associated neighboring parallel processor.

The central processor 1 has right and left external interfaces 5, 6 for communicating with
respective parallel processors 7, 8 in a symmetrical scheme, referred to as the Right
processor and the Left processor. The external interface is presented in Figure 1. The Left
and Right Processors are similar microprocessors to the central processor and are not
illustrated in detail.

In the above scheme, the processor 1 is viewed as the 'Middle processor', having similar
left and right neighbors, each presenting and controlling an identical interface.

The external signals are separated into three main groups of signals 9, 10, 11, as shown in
more detail in Figure 3, namely the Control and Status Lines (eight lines, six outgoing and
two bi-directional); the bi-directional Data Bus Lines, the number of which is
implementation dependent (16 in one embodiment); and the bi-directional Register Select
Lines, the number of which is implementation dependent (3 in one embodiment).

As shown in Figure 1, two adjacent processors share data through a dual port RAM 12,
13, mapped in the data memory space of both processors, and via two banks of dual port
registers (accessed from both the internal Data Bus and the external Left or Right Data Bus),
each processor having its own set (see Figure 3).

The central processor has an Inter-Processor Status register (IPSR) 2 that describes its 
state and functional mode with respect to the left and right processors. The IPSR register 
is shown in Figure 2. 

There are four possible states, so two bits are needed to describe the state:

1. Independent 

2. Parallel Computing (PC) 

3. Large Instruction Word (LIW) 

4. Suspended 

There are 2 possible modes (1 bit needed): 

• Master 

• Slave 

A central processor can be in a Master mode with respect to both neighboring processors, 
or a Master mode with respect to one and a Slave mode with respect to the other, but it 
can never be in a Slave mode with respect to both (left and right) processors 
simultaneously. 
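
As a rough illustration of the state and mode bits just described, the sketch below packs two state bits and one mode bit per side into a small integer and checks the rule that a processor is never a Slave on both sides. The bit positions, numeric codes and function names are assumptions made for this example; the actual register layout is the one shown in Figure 2.

```python
from enum import IntEnum


class State(IntEnum):      # two bits per side (numeric codes are assumed)
    INDEPENDENT = 0
    PC = 1
    LIW = 2
    SUSPENDED = 3


class Mode(IntEnum):       # one bit per side (numeric codes are assumed)
    MASTER = 0
    SLAVE = 1


def pack_side(state, mode):
    """Pack one side's (state, mode) into three bits: state in bits 2..1, mode in bit 0."""
    return (int(state) << 1) | int(mode)


def unpack_side(bits):
    """Inverse of pack_side."""
    return State((bits >> 1) & 0b11), Mode(bits & 0b1)


def pack_ipsr(left, right):
    """Hypothetical layout: left side in bits 5..3, right side in bits 2..0."""
    return (pack_side(*left) << 3) | pack_side(*right)


def slave_on_both_sides(left, right):
    """True for the combination that the text rules out."""
    return left[1] is Mode.SLAVE and right[1] is Mode.SLAVE


word = pack_ipsr((State.PC, Mode.MASTER), (State.LIW, Mode.SLAVE))
```

Here `word` describes a processor that is a PC Master toward its left neighbor and an LIW Slave toward its right neighbor.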

Any central processor can interrupt its left and/or right processor (status and interface line
conditions permitting) and bring it or them into a Master-Slave mode in which the Slave does
work on behalf of the Master.

Depending on the state and mode bits in the status register 2, a processor has various
access rights to the dual port data memory window and to the register bank of the
neighboring processor(s). Table 1 describes the access rights and the functionality of a
processor based on the state and mode bits configuration. In Table 1, the 'Symm. state'
column is used to label those situations where a symmetric situation could occur.



Table 1: Access rights and functionality based on status bits.

Left Side Bits  | Right Side Bits | Access                                     | Executed programs                                                        | Symm.
State   Mode    | State   Mode    |                                            |                                                                          | state
Indep.  Master  | Indep.  Master  | Restricted to its own regs. and data space | Executes its own job                                                     | NO
Indep.  Master  | PC      Master  | Restricted to its own regs. and data space | Executes its own job. Started || job on Right processor                  | YES
Indep.  Master  | PC      Slave   | Own regs. and data space + RDMWA (1)       | Executes || job on behalf of Right proc.                                 | YES
Indep.  Master  | LIW     Master  | Own regs. and data + RRA (2) + RDMWA       | Executes own job locking Right proc.                                     | YES
Indep.  Master  | LIW     Slave   | Own regs. and data + RRA + RDMWA           | Executes locked by Right proc.                                           | YES
Indep.  Master  | Suspend Slave   | Own regs. and data + RRA + RDMWA           | PC is frozen while NOPs are executed                                     | YES
PC      Master  | PC      Master  | Restricted to its own regs. and data space | Executes its own job. Started || jobs on Left and Right processors       | NO
PC      Master  | PC      Slave   | Own regs. and data + RDMWA                 | Executes job on behalf of Right proc. Started || job on Left             | YES
PC      Master  | LIW     Master  | Own regs. and data + RRA                   | Executes its own job locking Right proc. Started job on Left proc.       | YES
PC      Master  | LIW     Slave   | Own regs. and data + RRA + RDMWA           | Executes locked by Right proc. Started job on Left proc.                 | YES
PC      Master  | Suspend Slave   | Own regs. and data + RRA + RDMWA           | Started job on Left proc. Suspended while locked by Right                | YES
PC      Slave   | LIW     Master  | Own regs. and data + LDMWA + RRA           | Executes job on behalf of Left proc. + locking Right proc.               | YES
PC      Slave   | Suspend Master  | Own regs. and data + LDMWA + RRA           | Suspended while locking Right proc. Now executes job for Left processor  | YES
LIW     Master  | LIW     Master  | Own regs. and data + LRA (3) + RRA         | Executes its own job locking both Left and Right procs.                  | NO
LIW     Master  | LIW     Slave   | Own regs. and data + LRA + RRA + RDMWA     | Executes on behalf of and locked by Right, locking Left                  | YES
Suspend Master  | Suspend Slave   | Own regs. and data + LRA + RRA + RDMWA     | While in the above state has received (and passed to Left) the Suspend command | YES

1. RDMWA - Right processor Data Memory Window Access
2. RRA - Right processor Register Access
3. LRA - Left processor Register Access

The state and mode bits in the IPSR 2 uniquely determine the condition of the external 
interface status line. The mapping of the state and mode bits onto external status lines is 
given in Table 2. 



Table 2: Internal status bits to external status lines mapping

Left Side Bits  | Right Side Bits | Left state lines | Right state lines | Symm.
State   Mode    | State   Mode    | State   Mode     | State   Mode      | states
Indep.  Master  | Indep.  Master  | Indep.  Master   | Indep.  Master    | NO
Indep.  Master  | PC      Master  | PC      Master   | PC      Master    | YES
Indep.  Master  | PC      Slave   | PC      Slave    | PC      Slave     | YES
Indep.  Master  | LIW     Master  | LIW     Master   | LIW     Master    | YES
Indep.  Master  | LIW     Slave   | LIW     Slave    | LIW     Slave     | YES
Indep.  Master  | Suspend Slave   | Suspend Slave    | Suspend Slave     | YES
PC      Master  | PC      Master  | PC      Master   | PC      Master    | NO
PC      Master  | PC      Slave   | PC      Slave    | PC      Slave     | YES
PC      Master  | LIW     Master  | LIW     Master   | LIW     Master    | YES
PC      Master  | LIW     Slave   | LIW     Slave    | LIW     Slave     | YES
PC      Master  | Suspend Slave   | Suspend Slave    | Suspend Slave     | YES
PC      Slave   | LIW     Master  | LIW     Slave    | LIW     Slave     | YES
PC      Slave   | Suspend Master  | Suspend Slave    | Suspend Slave     | YES
LIW     Master  | LIW     Master  | LIW     Master   | LIW     Master    | NO
LIW     Master  | LIW     Slave   | LIW     Slave    | LIW     Slave     | YES
Suspend Master  | Suspend Slave   | Suspend Slave    | Suspend Slave     | YES


The possible actions of a processor with respect to the left/right processors, based on its
left/right status bits, its external status lines and the left/right processor status lines, are
given in Table 3.



Table 3: Possible actions of a processor based on its status bits and external status lines

Right Side Bits | Right Side Lines    | Right Status Lines  | Possible actions
State   Mode    | State       Mode    | State       Mode    |
Indep.  Master  | Indep.      Master  | Indep.      Master  | Force Right to PC
                | PC, LIW             | PC                  | Force Right to LIW
Indep.  Master  | Indep., PC  Master  | LIW         Master  | Force Right to PC
Indep.  Master  | PC          Slave   | Indep., PC  Master  | Force Right to PC; Force Right to LIW
Indep.  Master  | LIW         Slave   | Indep., PC  Master  | Force Right to PC; Force Right to LIW
PC      Slave   | PC          Slave   | PC          Master  | Report task completed
LIW     Master  | LIW         Master  | LIW         Slave   | Exit LIW state (unlock)



As will be apparent, there are four possible states and two possible modes. Of the eight
possible combinations, only one is invalid: the (Independent, Slave) combination.

The two pairs of status bits in the IPSR 2 determine the relation of the processor
with respect to the processor on each side. Only the combination of both sides' status bits
determines the real state of the processor.
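
The count of valid per-side combinations can be checked with a short enumeration. This is a minimal sketch; the string labels and the `is_valid` helper are chosen here for illustration only.

```python
from itertools import product

STATES = ("Independent", "PC", "LIW", "Suspended")
MODES = ("Master", "Slave")


def is_valid(state, mode):
    """Only (Independent, Slave) is excluded, as stated in the text."""
    return not (state == "Independent" and mode == "Slave")


# Four states times two modes gives eight combinations, one of them invalid.
VALID = [(s, m) for s, m in product(STATES, MODES) if is_valid(s, m)]
```

The seven entries of `VALID` correspond to the seven cases described one by one in the paragraphs that follow.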

Whenever a processor enters a Slave mode, almost all its registers get saved, such that the 
work can be resumed when the Master mode is re-entered. This can occur quickly with 
the use of shadow registers in this embodiment. 

The situation that arises in various valid combinations will now be described, although it 
will be apparent to one skilled in the art that other valid combinations are possible. 

1. (Independent, Master) 

A processor is in this state when the status bits on both sides of the IPSR 2 show it in this 
state. In this case the external status lines will show the same thing (see Table 2). 

In this state a processor executes code on behalf of itself and can access only its own 
registers and data memory. 

2. (Parallel Computing, Master) 

When one side of the IPSR register 2 shows this configuration and the other side shows
the Independent-Master case, the central processor 1 is in a Master-Slave relationship
with the processor on that side, has already started a parallel task on the processor on that
side, and can check on the state of that task by polling the corresponding Task Completed
bit in IPSR 2 or by executing a Wait until Task Completed on Left/Right instruction. In
this last case the processor will stay idle until the corresponding bit is set.

In this state the processor has the same access rights as in the (Independent, Master) state.
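
The polling behavior can be modeled in software as follows. This is a sequential sketch under stated assumptions, not the hardware mechanism: the `MasterIPSR` class, the `run_slave_task` and `poll` helpers and the dictionary of per-side bits are hypothetical names, and the Slave's completion is modeled simply by setting the Master's bit once the task function returns.

```python
class MasterIPSR:
    """Holds hypothetical per-side Task Completed bits of a PC Master."""

    def __init__(self):
        self.task_completed = {"left": False, "right": False}


def run_slave_task(ipsr, side, task, *args):
    """Model a neighbor running `task`, then signalling completion to the Master."""
    result = task(*args)
    ipsr.task_completed[side] = True   # completion becomes visible to the Master
    return result


def poll(ipsr, side):
    """What the Master sees when it polls the Task Completed bit for `side`."""
    return ipsr.task_completed[side]


ipsr = MasterIPSR()
before = poll(ipsr, "right")                       # task not yet finished
total = run_slave_task(ipsr, "right", sum, [1, 2, 3])
after = poll(ipsr, "right")                        # bit is now set
```

A Wait until Task Completed instruction would correspond to the Master idling until `poll` returns true, instead of continuing with its own work between polls.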

3. (Large Instruction Word, Master)

When one side of the IPSR register 2 shows this configuration (while the other side
shows the Independent-Master case), the central processor 1 is in a Master-Slave relation
with the processor on that side, and has already locked to that processor so as to
process Large Instruction Words in parallel. The processor that has been locked can, in
turn, lock to another one, and so on in cascade. Whenever the LIW-Master processor
jumps as a result of a control instruction (conditional/unconditional branches or looping
instructions), the take-the-branch condition is passed as a signal through the interfaces to
all the processors locked in the chain. In this way, synchronized jumps are ensured,
making assisted loop executions possible. When the processor executes a Release
Left/Right processor instruction, the locked processor becomes unlocked and the Master
can enter a state dependent on the status bits on the other side of IPSR 2.

In this state, the processor has access not only to the dual port data memory window
separating it from the Slave but also to the corresponding register bank of the locked
processor. The instruction set will be extended with instructions capable of accessing the left
or right processor.
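
The synchronized jump can be illustrated with a toy lockstep model. This is a sketch under simplifying assumptions: every locked processor is given its own program counter, a single boolean stands in for the take-the-branch signal, and the function names and loop structure are hypothetical.

```python
def step_chain(pcs, master_takes_branch, targets):
    """Advance every locked processor by one cycle.

    pcs[0] belongs to the LIW Master; the remaining entries belong to the
    processors locked in the chain. All of them follow the Master's decision.
    """
    if master_takes_branch:
        return list(targets)
    return [pc + 1 for pc in pcs]


def run_loop(n_procs, body_len, iterations):
    """Trace a loop of `body_len` instructions executed in lockstep."""
    pcs = [0] * n_procs
    remaining = iterations
    trace = []
    while True:
        trace.append(tuple(pcs))
        at_end = pcs[0] == body_len - 1      # Master reaches the loop branch
        if at_end:
            remaining -= 1
            if remaining == 0:
                break
        pcs = step_chain(pcs, at_end, [0] * n_procs)
    return trace


trace = run_loop(n_procs=3, body_len=2, iterations=2)
```

Because every processor applies the same decision in the same cycle, each entry of `trace` holds identical program counter values for all locked processors.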

4. (Suspended, Master)

Only one side of a processor can show this combination of state and mode bits. However,
the status bits on the opposite side of IPSR determine what the processor really does.

If the opposite status bits show (PC, Slave), the processor in fact is not suspended but is
rather executing a parallel task forced by the processor on that side. Before being forced
into a (PC, Slave) situation the processor was in a (LIW, Master) situation. When the
switch occurred, the processor had to suspend the LIW activity of itself and of the
processors locked up with it.

If the opposite status bits show (LIW, Slave), the processor is in fact suspended. In this
situation the processor has frozen its own PC and executes NOP instructions. Before
being in this state the processor was in a (LIW, Slave) situation with one of its sides and
in a (LIW, Master) situation with the other side. The processor has received a
SUSPEND signal from the Slave side that it has passed to the processor on the Master side.
In this way, when the head of the LIW link is suspended, all the processors in the chain will
get suspended.

5. (Parallel Computing, Slave) 

When one side of the IPSR register 2 shows this configuration (while the other side shows 
the Independent-Master case), the processor is in a Slave-Master relation with the 
processor on that side, on behalf of which it executes a task. The starting address of the 
task is passed to the processor when the Slave-Master relation has been established. At
the end of the task, the processor executes an End-Of-Task instruction that gets locked in
the corresponding status bits of the Master. When the End-Of-Task instruction is
executed, the processor enters a state that is dependent on the status bits on the other side
of the IPSR 2.

In this state, a processor has access to its own registers and data memory space and to the 
dual port memory window into the data space of the Master processor. 

6. (Large Instruction Word, Slave)

When one side of the IPSR register shows this configuration (while the other side shows
the Independent-Master case), the processor is in a Slave-Master relation with the
processor on that side. In this situation, the processor still has the ability to put itself into a
Master situation with respect to the processor on the other side.

As mentioned before, when multiple processors run in a locked state, synchronism is
essential. All processors should have the same master clock and they all should take (or
not take) a conditional branch based on the decision of the Master processor. In this case,
the Master drives the Jump interface line and all the Slaves in the chain execute a Branch
on External Decision instruction that takes the jump based on the state of the line.

A processor locked in a Slave mode has access not only to its own registers and data
memory space but also to the register banks of the neighboring processor it is running
locked with and to the dual port data memory windows into their data space.

7. (Suspended, Slave)

In this case the processor that was locked executes only NOP instructions, freezes the
Program Counter (PC), and waits for the Release signal.
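
The access rights stated in the prose for cases 1, 2, 3, 5 and 6 can be collected in a small lookup. This is a sketch only: the labels, the `EXTRA` mapping and the `access_rights` helper are hypothetical, the Suspended cases are left out because the prose does not spell out their access rights, and Table 1 remains the full listing.

```python
OWN = frozenset({"own registers", "own data memory"})

# Extra access granted toward the neighbor on one side, keyed by that
# side's (state, mode) pair, following the prose descriptions above.
EXTRA = {
    ("Independent", "Master"): frozenset(),
    ("PC", "Master"): frozenset(),
    ("PC", "Slave"): frozenset({"memory window"}),
    ("LIW", "Master"): frozenset({"memory window", "register bank"}),
    ("LIW", "Slave"): frozenset({"memory window", "register bank"}),
}


def access_rights(left, right):
    """Combine both sides' status bits into one set of access rights."""
    rights = set(OWN)
    for side, combo in (("left", left), ("right", right)):
        rights |= {f"{side} {extra}" for extra in EXTRA[combo]}
    return rights


pc_slave_on_right = access_rights(("Independent", "Master"), ("PC", "Slave"))
```

For example, `pc_slave_on_right` contains the processor's own resources plus the right memory window, which matches the (Indep. Master, PC Slave) row of Table 1.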

The internal register access and structure of a central processor will now be described 
with reference to Figure 4. 

Data memory bus 20 is connected through multiplexers 21 to Left, Middle and Right
registers 22, 23, 24, which in turn are connected through multiplexer 25 to processing unit
26 including the ALU/MAC, BS, and DAG. Because any processor in this architecture is
interruptible, almost all internal registers except for the IPSR 2 should be shadowed.

The MAC/ALU (Multiplier/Accumulator) architecture is shown in more detail in Figure 6,
in which for brevity only the input data flow is shown. Left DMD bus 21 is connected
through the interface to the corresponding bus in the left processor 8. In operation, data flows
from the left hand processor through MUX 22 to registers ALH, ALL (Accumulator Left
High, Accumulator Left Low), from where it passes through Mux 23 to Multiplier and
Accumulator and logic circuit 24, which is connected to the right barrel shifter 25.
Similarly, data from the right processor 7 arrives over the right DMD bus 26 and passes
through Mux 27, registers ARH, ARL, and Mux 28 to MAC unit 24. Internal bus 29 is
connected through Mux units 30, 31, 32, 33 to pairs of registers ALH, ALL; ARH, ARL;
AAH, AAL; ABH, ABL connected through Mux 34 and the left barrel shift register to MAC
unit 24. It will be apparent that this arrangement allows instruction words to be shared
between the adjacent processors.

When a processor becomes slave to another processor, it uses the shadow registers to 
preserve the last contents of its registers as a Master. The shadow registers are back- 
propagated to the main registers when the processor re-enters a master mode (with respect 
to both left and right processor). 
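
The save and restore behavior of the shadow registers can be sketched as follows. This is a simplified software model: the register names and the `RegisterFile` class are hypothetical, and real shadow registers would be switched in hardware rather than copied.

```python
class RegisterFile:
    """Main registers plus one shadow copy, as a simplified software model."""

    def __init__(self, names):
        self.main = {name: 0 for name in names}
        self.shadow = dict(self.main)

    def enter_slave(self):
        # Preserve the last contents the registers held in Master mode.
        self.shadow = dict(self.main)

    def reenter_master(self):
        # Back-propagate the preserved contents to the main registers.
        self.main = dict(self.shadow)


regs = RegisterFile(["AX", "AY"])      # register names are illustrative only
regs.main["AX"] = 5                    # value computed while Master
regs.enter_slave()
regs.main["AX"] = 99                   # Slave work overwrites the register
regs.reenter_master()
restored = regs.main["AX"]
```

After `reenter_master`, the value held before the processor became a Slave is available again, so the interrupted work can resume.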

For all three computational units (ALU, MAC and BS) a register relationship as presented 
in Figure 4 is valid. 

The ALU and the MAC usually require two operands, while the BS requires only one.
Depending on the architecture, the DAG requires 1 to 3 input registers. The set of
registers available to a computational block is symmetrically divided into three groups,
namely a set of n registers that can be loaded from their own DMD bus or some other
local bus, and two sets/banks of m registers that can be accessed not only from the local
buses but also from the adjacent (left or right) processors.

The access to an internal register from the left or the right processor, in a symmetrical
arrangement, is a significant aspect of the present invention. This change makes it possible
to take advantage of the Large Instruction Word functional state. While one DSP
performs an operation on the already existent registers, the neighboring DSPs can use the
additional buses for read/write access to other internal registers. The dual port memory is
used in this case to enhance the access of the neighboring DSPs to the data space of the
middle processor.

The m and n values should be relatively small (1 and 2 in one embodiment) because 
otherwise the propagation delays through various levels of multiplexing could add up to 
significant values. The totality of all registers accessible from the left (or right) processor 
forms the bank of registers used for communicating with the left (or right) processor. 

Because of the symmetry of the register distribution, similar banks of registers are
available in the left and right processor, and as such, in any two processor LIW interaction
two banks of registers will always be available for communication and for speeding up each
other's computations when needed.

The instruction set of a processor will be enhanced with instructions capable of addressing
the left or right processor. These instructions are operational and useful only when a
processor functions locked with another processor (in LIW state).

Tables 4 to 19 present the state and mode transitions. It should be noted that due to the
symmetrical properties of the architecture, the cases that are not covered can be derived
from those that are given.
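
The symmetry argument can be illustrated with a small sketch. The two rows below are transcribed from Table 4; the field names and the `mirror` helper are hypothetical, and the idea that a mirrored case is obtained simply by exchanging the left and right columns is an assumption of this sketch.

```python
# Two rows of Table 4 (initial status bits: Left Indep. Master, Right Indep. Master).
TABLE_4 = {
    "Int.: force Right to PC": {
        "left_bits": ("Indep.", "Master"), "right_bits": ("PC", "Master"),
        "left_lines": ("PC", "Master"), "right_lines": ("PC", "Master"),
    },
    "Right: Enter PC": {
        "left_bits": ("Indep.", "Master"), "right_bits": ("PC", "Slave"),
        "left_lines": ("PC", "Slave"), "right_lines": ("PC", "Slave"),
    },
}


def mirror(action, row):
    """Derive the left/right mirrored case of one transition row."""
    swapped = (action.replace("Right", "\0")
                     .replace("Left", "Right")
                     .replace("\0", "Left"))
    return swapped, {
        "left_bits": row["right_bits"], "right_bits": row["left_bits"],
        "left_lines": row["right_lines"], "right_lines": row["left_lines"],
    }


mirrored_action, mirrored_row = mirror("Right: Enter PC", TABLE_4["Right: Enter PC"])
```

Under that assumption, the row for a Right processor forcing the central processor into PC yields the corresponding row for the Left processor without listing it separately.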



Table 4: Initial status bits  Left: Indep. Master  Right: Indep. Master

Action                   | Left status bits | Right status bits | Regs state | Left state lines | Right state lines
Int.: force Right to PC  | Indep. Master    | PC Master         | Saved      | PC Master        | PC Master
Int.: force Right to LIW | Indep. Master    | LIW Master        | Saved      | LIW Master       | LIW Master
Right: Enter PC          | Indep. Master    | PC Slave          | Saved      | PC Slave         | PC Slave
Right: Enter LIW         | Indep. Master    | LIW Slave         | Saved      | LIW Slave        | LIW Slave

Table 5

Action                   | Left status bits | Right status bits | Regs state | Left state lines | Right state lines
Int.: force Left to PC   | PC Master        | PC Master         | Saved      | PC Master        | PC Master
Int.: force Left to LIW  | LIW Master       | PC Master         | Saved      | LIW Master       | LIW Master
                         |                  | Indep. Master     | Saved      | Indep. Master    | Indep. Master
Left: Enter PC           | PC Slave         | PC Master         | Saved      | PC Slave         | PC Slave
Left: Enter LIW          | LIW Slave        | PC Master         | Saved      | LIW Slave        | LIW Slave



Table 6

Action                   | Left status bits | Right status bits | Regs state | Left state lines | Right state lines
Int.: force Left to LIW  | LIW Master       | PC Slave          | Saved      | LIW Slave        | LIW Slave
Int.: force Left to PC   | PC Master        | PC Slave          | Saved      | PC Slave         | PC Slave
Int.: task completed     | Indep. Master    | Indep. Master     | Saved      | Indep. Master    | Indep. Master

Table 7: Initial status bits

Action                   | Left status bits | Right status bits | Regs state | Left state lines | Right state lines
Int.: force Left to LIW  | LIW Master       | LIW Master        | Saved      | LIW Master       | LIW Master
Int.: force Left to PC   | PC Master        | LIW Master        | Saved      | PC Master        | LIW Master
Right: exit LIW          | Indep. Master    | Indep. Master     | Saved      | Indep. Master    | Indep. Master
Left: enter PC           | PC Slave         | Susp. Master      | Saved      | PC Slave         | Susp. Slave

Table 8: Initial status bits  Left: Indep. Master  Right: LIW Slave

Action                   | Left status bits | Right status bits | Regs state | Left state lines | Right state lines
Int.: force Left to LIW  | LIW Master       | LIW Slave         | Saved      | LIW Slave        | LIW Slave
Int.: force Left to PC   | PC Master        | LIW Slave         | Saved      | LIW Slave        | LIW Slave
Right: exit LIW          | Indep. Master    | Indep. Master     | Saved      | Indep. Master    | Indep. Master
Right: suspend           | Indep. Master    | Susp. Slave       | Saved      | Susp. Slave      | Susp. Slave

Table 9

Action                   | Left status bits | Right status bits | Regs state | Left state lines | Right state lines
Right: exit Suspend      | Indep. Master    | LIW Slave         | Saved      | LIW Slave        | LIW Slave



Tabic 10: Initial status bits Lett: PC Master Right: PC Master 



3/2/2007, EAST Version: 2.1.0.14 



WO 97/34226 



PCT/CA97/00164 



-14- 



Action | Left status bits (State Mode) | Right status bits (State Mode) | Regs state | Left state lines (State Mode) | Right state lines (State Mode)
Left: task completed | Indep. Master | PC Master | Saved | PC Master | PC Master
Right: task completed | PC Master | Indep. Master | Saved | PC Master | PC Master
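
As a rough illustration only, the two rows of Table 10 can be encoded as a lookup from an action to the resulting Left and Right status bits. The names `Status`, `TABLE_10`, and `apply_action` are hypothetical and are not part of the described apparatus; the sketch models the status bits only, not the registers or state lines.

```python
from typing import Dict, NamedTuple, Tuple


class Status(NamedTuple):
    state: str  # "Indep.", "PC", "LIW" or "Susp."
    mode: str   # "Master" or "Slave"


PC_MASTER = Status("PC", "Master")
INDEP_MASTER = Status("Indep.", "Master")

# Initial status bits for Table 10: Left and Right are both PC Master.
INITIAL: Tuple[Status, Status] = (PC_MASTER, PC_MASTER)

# action -> (left status bits, right status bits) after the action,
# copied from the two rows of Table 10.
TABLE_10: Dict[str, Tuple[Status, Status]] = {
    "Left: task completed": (INDEP_MASTER, PC_MASTER),
    "Right: task completed": (PC_MASTER, INDEP_MASTER),
}


def apply_action(action: str) -> Tuple[Status, Status]:
    """Return the (left, right) status bits that Table 10 lists for an action."""
    if action not in TABLE_10:
        raise ValueError(f"action not listed in Table 10: {action!r}")
    return TABLE_10[action]


if __name__ == "__main__":
    left, right = apply_action("Left: task completed")
    print(left, right)
```

The other tables could be encoded the same way, keyed on the pair of initial status bits.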



Table 11: Initial status bits Left: PC Master Right: PC Slave

Action | Left status bits (State Mode) | Right status bits (State Mode) | Regs state | Left state lines (State Mode) | Right state lines (State Mode)
Left: task completed | Indep. Master | PC Slave | Saved | PC Slave | PC Slave
 | PC Master | Indep. Master | Saved | PC Master | PC Master



Table 12: Initial status bits Left: PC Master Right: LIW Master

Action | Left status bits (State Mode) | Right status bits (State Mode) | Regs state | Left state lines (State Mode) | Right state lines (State Mode)
 | Indep. Master | LIW Master | Saved | LIW Master | LIW Master
Int.: exit LIW (unlock) | PC Master | Indep. Master | Saved | PC Master | PC Master




Table 13: Initial status bits Left: PC Master Right: LIW Slave

Action | Left status bits (State Mode) | Right status bits (State Mode) | Regs state | Left state lines (State Mode) | Right state lines (State Mode)
Left: task completed | Indep. Master | LIW Slave | Saved | LIW Slave | LIW Slave
 | PC Master | Susp. Slave | Saved | Susp. Slave | Susp. Slave
Right: exit LIW | PC Master | Indep. Master | Saved | PC Master | PC Master


Table 14: Initial status bits Left: PC Master Right: Suspend Slave

Action | Left status bits (State Mode) | Right status bits (State Mode) | Regs state | Left state lines (State Mode) | Right state lines (State Mode)
Right: exit Suspend | PC Master | LIW Slave | Saved | LIW Slave | LIW Slave
Left: task completed | Indep. Master | Susp. Slave | Saved | Susp. Slave | Susp. Slave


Table 15: Initial status bits Left: PC Slave Right: LIW Master

Action | Left status bits (State Mode) | Right status bits (State Mode) | Regs state | Left state lines (State Mode) | Right state lines (State Mode)
Int.: task completed | Indep. Master | Indep. Master | Saved | Indep. Master | Indep. Master
Int.: exit LIW (unlock) | PC Slave | Indep. Master | Saved | PC Slave | PC Slave



Table 16: Initial status bits

Action | Left status bits (State Mode) | Right status bits (State Mode) | Regs state | Left state lines (State Mode) | Right state lines (State Mode)
 | Indep. Master | LIW Master | Saved | LIW Master | LIW Master



Action | Left status bits (State Mode) | Right status bits (State Mode) | Regs state | Left state lines (State Mode) | Right state lines (State Mode)
Int.: exit LIW Left | Indep. Master | LIW Master | Saved | LIW Master | LIW Master
Int.: exit LIW Right | LIW Master | Indep. Master | Saved | LIW Master | LIW Master



Action | Left status bits (State Mode) | Right status bits (State Mode) | Regs state | Left state lines (State Mode) | Right state lines (State Mode)
Int.: exit LIW Left | Master | LIW Slave | Saved | LIW Slave | LIW Slave
Right: exit LIW | Master | Indep. Master | Saved | Indep. Master | Indep. Master
 | | Slave | Saved | Susp. Slave | Susp. Slave



Action | Left status bits (State Mode) | Right status bits (State Mode) | Regs state | Left state lines (State Mode) | Right state lines (State Mode)
 | LIW Master | LIW Slave | Saved | LIW Slave | LIW Slave



The following table presents all the software commands required to perform the various
actions described in the previous tables.



Table 20

Command | Description
XTR address | eXecute Task starting at 'address' on Right processor
XTL address | eXecute Task starting at 'address' on Left processor
LCKR address | LoCK Right processor (force right to LIW state) starting at 'address'
LCKL address | LoCK Left processor (force left to LIW state) starting at 'address'
EOT | End Of Task (reported to the processor on the slave side)
RELR | RELease (unlock) Right processor
RELL | RELease (unlock) Left processor
BED address | Branch on External Decision
WTCL | Wait for Task Completed on Left processor
WTCR | Wait for Task Completed on Right processor
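
For illustration, the command set of Table 20 can be written down as an enumeration together with a small parser for one line of assembly-like text. This is only a sketch: the textual syntax (`"XTR 0x100"`), the `Command` enumeration, and the `parse` function are assumptions made for the example, and no instruction encoding is implied.

```python
from enum import Enum
from typing import Optional, Tuple


class Command(Enum):
    # value = (takes an address operand, short description from Table 20)
    XTR = (True, "eXecute Task on Right processor")
    XTL = (True, "eXecute Task on Left processor")
    # LCKR is assumed to take an address because its description
    # says "starting at address".
    LCKR = (True, "LoCK Right processor (force right to LIW state)")
    LCKL = (True, "LoCK Left processor (force left to LIW state)")
    EOT = (False, "End Of Task")
    RELR = (False, "RELease (unlock) Right processor")
    RELL = (False, "RELease (unlock) Left processor")
    BED = (True, "Branch on External Decision")
    WTCL = (False, "Wait for Task Completed on Left processor")
    WTCR = (False, "Wait for Task Completed on Right processor")


def parse(line: str) -> Tuple[Command, Optional[int]]:
    """Parse a hypothetical 'MNEMONIC [address]' line into (command, address)."""
    parts = line.split()
    if not parts or parts[0] not in Command.__members__:
        raise ValueError(f"unknown command: {line!r}")
    cmd = Command[parts[0]]
    takes_address = cmd.value[0]
    expected = 2 if takes_address else 1
    if len(parts) != expected:
        raise ValueError(f"wrong number of operands for {cmd.name}")
    # int(..., 0) accepts decimal as well as 0x-prefixed hexadecimal text.
    address = int(parts[1], 0) if takes_address else None
    return cmd, address
```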
In one embodiment, the first four instructions in Table 20 (XTR, XTL, LCKR, LCKL) are
blocking. If the processor they are trying to bring into a Master-Slave relation is in a
state that does not permit the desired state transition, the issuing processor enters a
state in which it keeps retrying the instruction. In a different embodiment, these
instructions can be made non-blocking. In this case, the program needs code that handles
a successful attempt and code that handles a failed attempt.
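
The difference between the two embodiments can be sketched in software as follows. This is a simulation under stated assumptions, not the hardware mechanism: `make_neighbor`, `blocking`, and `try_once` are hypothetical names, a neighbor is modeled as a callable that reports whether the transition succeeded, and the `max_cycles` bound exists only so that the simulation terminates.

```python
from typing import Callable


def make_neighbor(busy_cycles: int) -> Callable[[], bool]:
    """Model a neighbor that refuses the transition for `busy_cycles` attempts."""
    remaining = busy_cycles

    def attempt() -> bool:
        nonlocal remaining
        if remaining > 0:
            remaining -= 1
            return False  # neighbor's state does not permit the transition yet
        return True

    return attempt


def blocking(attempt: Callable[[], bool], max_cycles: int = 1000) -> int:
    """Blocking form: retry every cycle until success; return the cycles used."""
    for cycle in range(1, max_cycles + 1):
        if attempt():
            return cycle
    raise TimeoutError("neighbor never accepted the transition")


def try_once(attempt: Callable[[], bool]) -> bool:
    """Non-blocking form: a single attempt; the caller handles both outcomes."""
    return attempt()


if __name__ == "__main__":
    print(blocking(make_neighbor(3)))
    if try_once(make_neighbor(3)):
        print("success path")
    else:
        print("failure path")
```

With the non-blocking form, the caller must supply both branches, which corresponds to the two pieces of code the program needs.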

Besides the specific instructions given in the table, some of the usual instructions of a 
DSP are extended to handle external register bank access rights. 

The instructions XTR, XTL, LCKR, and LCKL require at least two cycles to execute. During the
first cycle, the processor executing one of these instructions will try, based on its own 
status bits and other processor status lines, to force a neighboring processor into a Slave 
situation. If this attempt is successful, during the second cycle an address will be passed 
over the Data Bus lines to the other processor. In many cases, a third cycle is required for 
the second processor to fetch the instruction found at the address passed. 
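
The cycle sequence just described can be traced with a small model. It is a sketch only: the `Neighbor` class, the `execute_on_neighbor` function, and the dictionary standing in for program memory are assumptions for the example, and the single `can_become_slave` flag replaces the real decision based on status bits and status lines.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Neighbor:
    can_become_slave: bool
    mode: str = "Master"
    received_address: Optional[int] = None
    fetched: Optional[str] = None


def execute_on_neighbor(neighbor: Neighbor, address: int,
                        program: Dict[int, str]) -> List[str]:
    """Trace the cycles of an XTR/XTL/LCKR/LCKL-style instruction."""
    log: List[str] = []
    # Cycle 1: try to force the neighbor into a Slave situation.
    if not neighbor.can_become_slave:
        log.append("cycle 1: attempt failed")
        return log
    neighbor.mode = "Slave"
    log.append("cycle 1: neighbor forced to Slave")
    # Cycle 2: pass the address over the data bus lines.
    neighbor.received_address = address
    log.append(f"cycle 2: address {address:#x} passed")
    # Cycle 3: the neighbor fetches the instruction found at that address.
    neighbor.fetched = program[address]
    log.append(f"cycle 3: neighbor fetched {neighbor.fetched!r}")
    return log


if __name__ == "__main__":
    for entry in execute_on_neighbor(Neighbor(True), 0x100, {0x100: "EOT"}):
        print(entry)
```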
A conflict arises when two processors attempt to put each other in a Master-Slave relation
simultaneously. One solution is to always give priority to the processor on the right side
of the couple. To implement this, in one embodiment, an extra interface line (the
ACKnowledgment line) and an Arbitration block that is biased to the right are added.
This arrangement is shown in Figure 5, where central processor 1 is shown
connected to Right and Left processors 7, 8. The IPSR 2 of each processor has an
arbitration block 30.

Where the software can guarantee that such conflicts do not occur, the Arbitration block 
and the additional interface line are not required. 
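
A right-biased arbitration rule can be expressed as a very small function. This sketch assumes a single couple of processors and two request flags; the function name `arbitrate` and its string return values are hypothetical, and the returned side is taken to be the one that would receive the acknowledgment.

```python
from typing import Optional


def arbitrate(left_requests: bool, right_requests: bool) -> Optional[str]:
    """Return the side granted the Master role, giving priority to the right."""
    if right_requests:
        return "right"  # the right side wins, including simultaneous requests
    if left_requests:
        return "left"
    return None  # nobody requested a Master-Slave relation


if __name__ == "__main__":
    print(arbitrate(left_requests=True, right_requests=True))
```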

The present invention thus offers a powerful technique for evenly distributing the 
processing power of complex applications over multiple DSPs, using Parallel Computing 
and Large Instruction Word methods, which can be of variable length. 
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Because of the processing power and additional buses made available by multiple 
processors through this new distributed architecture method, it can be used with slower 
master clocks or slower memories. 

The new distributed architecture is particularly suited for the case where the processors 
are sharing the silicon space of the same integrated circuit. 

Due to its symmetrical properties, the distributed architecture can be easily scaled up to 
provide the necessary computational power for very complex DSP tasks even at low
master clock rates or slow memory access time. 
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We claim: 

1. Digital processing apparatus characterized in that it comprises:

a) a microprocessor having at least one external interface for connection to a 
respective parallel processor having a similar interface, said interface permitting the 
exchange of data and control signals to permit said central processor and one or more 
parallel processors to cooperatively share in the execution of a program; and 

b) an inter-processor status register for maintaining the current status of said
processors and said at least one parallel processor. 

2. Digital processing apparatus as claimed in claim 1, characterized in that said
interface permits the exchange of signals and accessing of internal registers of a 
neighboring processor so that said processors can cooperatively share in the execution of 
a single instruction represented by a large instruction word. 

3. Digital processing apparatus as claimed in claim 1 or claim 2, characterized in that 
said microprocessor includes dual-ported memory that can be mapped into the data 
memory space of said microprocessor and an adjacent said parallel processor to provide a 
window between said adjacent processors. 

4. Digital processing apparatus as claimed in any one of claims 1 to 3, characterized
in that said interface includes control and status lines, data bus lines, and register select 
lines. 

5. Digital processing apparatus as claimed in any one of claims 1 to 4, characterized 
in that said inter-processor status register includes for each said parallel processor a 
memory cell storing the processing state of the processor, the memory cell storing the 
current mode of operation, and a memory cell storing the state of completion of a current 
task. 

6. Digital processing apparatus as claimed in any one of claims 1 to 5,
characterized in that said interface is operative to permit the exchange of control and data
signals to permit the parallel execution in each processor of sequences of separate 
instructions forming independent routines. 
7. Digital processing apparatus as claimed in any one of claims 1 to 6,
characterized in that said interface includes a jump line to send a signal to the or each 
cooperating parallel processor so that when said microprocessor encounters a jump 
instruction, the or each said parallel processor also executes a jump so as to make loop 
executions possible. 

8. Digital processing apparatus as claimed in claim 8, characterized in that said 
processors include an arbitration unit and said interface includes an acknowledgment line 
so as to permit conflict resolution between cooperating processors. 

9. A distributed architecture parallel processing apparatus, characterized in that it 
comprises a microprocessor having at least one external interface connected to a similar 
interface of a neighboring parallel processor, said processors exchanging data and control 
signals through said interfaces to cooperatively share in the execution of a program; and 
an inter-processor status register in each processor for maintaining the current status of 
said processors. 

10. A distributed architecture parallel processing apparatus as claimed in claim 9,
characterized in that adjacent said processors include dual-ported memory to share a 
common address space mapped to each processor so as to provide a memory window 
therebetween. 

11. A distributed architecture parallel processing apparatus as claimed in claim 9 or 
claim 10, characterized in that it includes control and status lines, and data bus lines. 

12. A distributed architecture parallel processing apparatus as claimed in claim 11,
characterized in that said interface means further includes a jump line for sending a signal 
to an adjacent cooperating parallel processor so that when a master processor encounters a 
jump instruction, the or each cooperating parallel processor will jump in synchronism to 
permit assisted loop executions. 

13. A distributed architecture parallel processing apparatus as claimed in claim 11 or
claim 12, characterized in that said processors include an arbitration unit and said
interface means further includes an acknowledgment line so as to permit conflict 
resolution between cooperating processors. 
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14. A distributed architecture parallel processing apparatus as claimed in any of 
claims 9 to 14, characterized in that any of said processors can be in a master mode and 
any of the remaining processors can be in a slave mode relative to said processor in the 
master mode. 

15. A distributed architecture parallel processing apparatus as claimed in claim 14,
characterized in that said processors are provided on a common integrated circuit. 

16. A distributed architecture parallel processing apparatus as claimed in claim 15,
characterized in that said processors include internal registers that are shadowed, and 
arranged such that when a master processor becomes a slave to another processor the last 
contents of the register in the master mode are preserved in shadow memory. 

17. A method of executing a program characterized in that it comprises the steps of: 

a) providing at least two parallel processors, one said processor being a master and 
the or each remaining processor being a slave; 

b) interconnecting said processors through an external interface so that they can 
exchange data and control signals to cooperatively share in the execution of a program; 
and 

c) maintaining the status of the cooperating processors in an inter-processor status
register provided therein. 

18. A method as claimed in claim 17, characterized in that the execution of a single
instruction defined by a large instruction word is shared between the cooperating 
processors. 

19. A method as claimed in claim 17 or claim 18, characterized in that said
cooperating processors are further capable of sharing the execution of a program task, 
each executing an independent sequence of program instructions. 

20. A method as claimed in claim 18, characterized in that neighboring said
processors share a common address space through a dual-ported memory. 

21. A method as claimed in claim 20, characterized in that one of said processors
serves as a master and the or each parallel processor serves as a slave. 
22. A method as claimed in claim 21, characterized in that said processors are
synchronized over a jump line through said interface so that when the master executes a 
program jump, the or each slave processor executes a program jump in synchronism 
therewith to permit assisted loop executions. 

23. Digital processing apparatus comprising:

a) a microprocessor having at least one external interface for connection to a 
respective parallel processor having a similar interface, said interface permitting the 
exchange of data and control signals to permit said central processor and one or more 
parallel processors to cooperatively share in the execution of a program; and 

b) means for maintaining the current status of said processors and said at least one 
parallel processor. 
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